Skip to content

Conversation

ChrisPaulBennett
Copy link
Contributor

@ChrisPaulBennett ChrisPaulBennett commented Mar 12, 2025

This apart of 3 pull requests for adding CPU time and Max RSS analysis to the Cylc UI.

This adds the Max RSS and CPU time (as measured by cgroups) to the table view, box plot and time series views.

Linked to;
cylc/cylc-flow#6663
cylc/cylc-uiserver#675

Check List

  • I have read CONTRIBUTING.md and added my name as a Code Contributor.
  • Contains logically grouped changes (else tidy your branch by rebase).
  • Does not contain off-topic changes (use other PRs for other changes).
  • Applied any dependency changes to both setup.cfg (and conda-environment.yml if present).
  • Tests are included (or explain why tests are not needed).
  • Changelog entry included if this is a change that can affect users
  • Cylc-Doc pull request opened if required at cylc/cylc-doc/pull/XXXX.
  • If this is a bug fix, PR should be raised against the relevant ?.?.x branch.

@ChrisPaulBennett ChrisPaulBennett marked this pull request as draft March 12, 2025 09:38
@oliver-sanders oliver-sanders added this to the Pending milestone Mar 12, 2025
@MetRonnie

This comment was marked as resolved.

@MetRonnie
Copy link
Member

We'll get round to this when we can, but a little bit swamped at the moment!

@wxtim
Copy link
Member

wxtim commented Apr 16, 2025

I've had a first look -

The "chips" look good:

image

  • Is there a reason I don't get maxrss for the hpc background job?
  • Might these be more useful if the data was converted to human readable?

If nothing has finished the analysis view sticks in this state:

image

I threw a fairly nasty workflow at it and noticed some anomalies:

image

Cannot page through, cannot see tasks I know to have completed.

@ChrisPaulBennett
Copy link
Contributor Author

Oliver recommended the best way to move data around cylc was to use a debug message. Those chips only only appear in dev mode and won't be visible to normal users.
What's a HPC background job?
The analysis view being stuck in that state was introduced when I merged in @oliver-sanders changes to SQL. I haven't figured out what is causing this yet. Clicking refresh will solve the issue.

@wxtim
Copy link
Member

wxtim commented Apr 23, 2025

What's a HPC background job?

Using the supercomputer login node as just another computer. We use the term background to mean execution a job in bash without using a job runner like PBS or Slurm.

E.g. > ssh exab bash program.sh rather than ssh exab qsub program.sh

Warning

Irrelevant
@oliver-sanders pointed out to me after I posted the original comment that background jobs don't use cgroups in the same way, so the profiler probably won't work for them anyway.

@oliver-sanders
Copy link
Member

oliver-sanders commented Apr 23, 2025

Is there a reason I don't get maxrss for the hpc background job?

  1. You would need to configure this platform to run the profiler.
  2. There are no cgroups to poll for background jobs so it wouldn't work if you tried.

Those chips only only appear in dev mode and won't be visible to normal users.

Debug is def the right level to send these messages at, however, I'm not sure we've got the UI to hide debug level messages yet. Orthogonal to this PR either way.

@oliver-sanders oliver-sanders added the analysis views Work pertaining to the analysis/gantt/etc views label May 20, 2025
Comment on lines 124 to 135
// If memory value passed
} else if (timingOption === 'maxRss') {
if (value < 5000) {
const kilobytes = value
return kilobytes.toPrecision(3) + ' KB'
} else if (value / 1024 < 1000) {
const megabytes = value / 1024
return megabytes.toPrecision(3) + ' MB'
} else {
const gigabytes = value / 1048576
return gigabytes.toPrecision(3) + ' GB'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO this ought to be in a different function as it's not a duration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I think the name of the function should be change. Originally, when we only had timing on the analysis view, this function displayed the timings in a sensible format.
It's ensuring the appropriate scale, for the appropriate unit is displayed on the Y-axis. Wouldn't make sense to have two functions doing that.
formatUnits() ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatDuration is used by other components such as the Tree and Table views, so it needs to stay.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I get you?
The the function won't change. I'm just changing the name to better reflect it's purpose

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its purpose should remain as formatting durations. I would add a new function for formatting RSS. If you want you could also have a function like

function formatUnits (...) {
  if (duration) formatDuration(...)
  else formatRSS(...)
}

Comment on lines 124 to 135
// If memory value passed
} else if (timingOption === 'maxRss') {
if (value < 5000) {
const kilobytes = value
return kilobytes.toPrecision(3) + ' KB'
} else if (value / 1024 < 1000) {
const megabytes = value / 1024
return megabytes.toPrecision(3) + ' MB'
} else {
const gigabytes = value / 1048576
return gigabytes.toPrecision(3) + ' GB'
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatDuration is used by other components such as the Tree and Table views, so it needs to stay.

{ title: `Min ${this.timingOption} time`, value: `min${upperFirst(this.timingOption)}Time` },
{ title: `Max ${this.timingOption} time`, value: `max${upperFirst(this.timingOption)}Time` },
{ title: `Mean ${formatChartLabels(this.timingOption)}`, value: `mean${upperFirst(getTimingOption(this.timingOption))}` },
{ title: `Median ${formatChartLabels(this.timingOption)}`, value: `median${upperFirst(getTimingOption(this.timingOption))}` },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what is happening here but in the offline mode, sorting by median time is not working for any metric

Image

Possibly missing from offline data?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analysis views Work pertaining to the analysis/gantt/etc views
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants